A Survey of Predictive Modelling under Imbalanced Distributions

Authors

  • Paula Branco
  • Luís Torgo
  • Rita P. Ribeiro

Abstract

Many real-world data mining applications involve obtaining predictive models from data sets with strongly imbalanced distributions of the target variable. Frequently, the least common values of this target variable are associated with events that are highly relevant for end users (e.g. fraud detection, unusual returns on stock markets, anticipation of catastrophes). Moreover, these events may have different costs and benefits; combined with the rarity of some of them in the available training data, this creates serious problems for predictive modelling techniques. This paper presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey we discuss the main challenges raised by imbalanced distributions, describe the main approaches to these problems, propose a taxonomy of these methods and refer to some related problems within predictive modelling.
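As a minimal sketch of why rare, costly events cause trouble for standard modelling and evaluation, the Python snippet below (an illustration, not taken from the paper) scores two trivial classifiers on a hypothetical data set in which 10 of 1,000 cases (1%) belong to the rare class. The 1% rate and the cost values `cost_fn=100.0` and `cost_fp=1.0` are assumptions chosen only for illustration. Plain accuracy favours always predicting the common value, even though that model never detects a rare event, while the assumed cost matrix ranks the two models the other way round.

```python
# Hypothetical data set: 1,000 cases, 10 of them (1%) belong to the rare class 1.
y_true = [1] * 10 + [0] * 990


def accuracy(y_true, y_pred):
    """Fraction of cases predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of the rare (positive) cases that the model detects."""
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    total = sum(t == positive for t in y_true)
    return hits / total if total else 0.0


def total_cost(y_true, y_pred, cost_fn=100.0, cost_fp=1.0):
    """Total misclassification cost under an ASSUMED cost matrix:
    missing a rare event (false negative) costs 100x a false alarm."""
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return fn * cost_fn + fp * cost_fp


# Two trivial models: always predict the common value / always predict the rare one.
always_common = [0] * len(y_true)
always_rare = [1] * len(y_true)

for name, y_pred in [("always_common", always_common), ("always_rare", always_rare)]:
    print(name, accuracy(y_true, y_pred), recall(y_true, y_pred), total_cost(y_true, y_pred))
# always_common 0.99 0.0 1000.0
# always_rare 0.01 1.0 990.0
```

Neither trivial model is useful in practice; the point is only that a metric which treats all errors equally can hide a complete failure on the rare cases that end users care about.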

Similar resources

Predicting Customer Online Shopping Adoption - an Evaluation of Data Mining and Market Modelling Approaches

Accurate prediction of shopping channel preferences has become an important issue for retailers seeking to maximize customer loyalty. In data mining, novel approaches such as neural networks (NN) have been proposed to predict the probability of class memberships in addition to statistical methods from marketing modelling. However, Data Mining suggests new approaches to data preprocessing in ord...

Comparing Discriminant Analysis, Ecological Niche Factor Analysis and Logistic Regression Methods for Geographic Distribution Modelling of Eurotia ceratoides (L.) C. A. Mey

Eurotia ceratoides (L.) C. A. Mey is an important plant species in semi-arid lands in Iran. New approaches are required to determine the distribution of this plant species. For this reason, geographical distributions of Eurotia ceratoides were assessed using three different models including: Multiple Discriminant Analysis (MDA), Ecological Niche Factor Analysis (ENFA) and Logistic Regression (LR). ...

Model Predictive Control of Distributed Energy Resources with Predictive Set-Points for Grid-Connected Operation

This paper proposes an MPC-based (model predictive control) scheme to control active and reactive powers of DERs (distributed energy resources) in a grid-connected mode (either through a bus with its associated loads as a PCC (point of common coupling) or an MG (micro-grid)). DER may be a DG (distributed generation) or an ESS (energy storage system). In the proposed scheme, the set-poin...

Polichotomies on Imbalanced Domains by One-per-Class Compensated Reconstruction Rule

A key issue in machine learning is the ability to cope with recognition problems where one or more classes are under-represented with respect to the others. Indeed, traditional algorithms fail under class-imbalanced distributions, resulting in low predictive accuracy over the minority classes. While a large literature exists on binary imbalanced tasks, little research exists on multiclass learning. ...

A Prediction for Classification of Highly Imbalanced Medical Dataset Using Databoost.IM with SVM

Recently, class imbalance problems have attracted growing interest because of the classification difficulty caused by imbalanced class distributions. In particular, many ensemble learning and machine learning methods have been proposed for classifying imbalanced problems. However, these methods produce poor predictive accuracy on two-class imbalanced datasets. In this paper,...

Journal:
  • CoRR

Volume: abs/1505.01658

Publication date: 2015